EXAMINER'S ANSWER



This is in response to the appeal brief filed 05/26/2006 appealing from the Office action mailed 
09/20/2005. 
(1) Real Party in Interest

A statement identifying by name the real party in interest is contained in the brief. 

(2) Related Appeals and Interferences 

The examiner is not aware of any related appeals, interferences, or judicial proceedings 
which will directly affect or be directly affected by or have a bearing on the Board's decision in 
the pending appeal. 

(3) Status of Claims 

The statement of the status of claims contained in the brief is correct. 

(4) Status of Amendments After Final 

The appellant's statement of the status of amendments after final rejection contained in 
the brief is correct. 

(5) Summary of Claimed Subject Matter 

The summary of claimed subject matter contained in the brief is correct. 

(6) Grounds of Rejection to be Reviewed on Appeal 

The appellant's statement of the grounds of rejection to be reviewed on appeal is correct. 



(7) Claims Appendix

A substantially correct copy of appealed claims 8-30 appears on pages 19-22 of the
Appendix to the appellant's brief. The minor errors are as follows:

Re claims 8-9, the status identifier for these claims should be "Previously Amended" in
place of "Rejected".

Re claims 10-30, the status identifier for these claims should be "Original" in place of
"Rejected".



(8) Evidence Relied Upon 

No evidence is relied upon by the examiner in the rejection of the claims under appeal. 
(9) Grounds of Rejection

The following ground(s) of rejection are applicable to the appealed claims: 

Claims 8-9, 15-16, and 19-23 were rejected under 35 U.S.C. § 102(b) as being
anticipated by Chip et al. ("The Coreware Methodology: building a 200 Mflop processor
in 9 man months").

Claims 10-11 were rejected under 35 U.S.C. § 103(a) as being unpatentable over
Chip et al. ("The Coreware Methodology: building a 200 Mflop processor in 9 man
months") in view of Debabrata et al. ("A 600 MHz half-bit level pipelined
accumulator-interleaved multiplier accumulator core").

Claims 12-14, 17-18, and 24-30 were rejected under 35 U.S.C. § 103(a) as being
unpatentable over Chip et al. ("The Coreware Methodology: building a 200 Mflop
processor in 9 man months") in view of Choquette (U.S. 6,480,872).

DETAILED ACTION 
Claim Rejections - 35 USC § 102 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the 
basis for the rejections under this section made in this Office action: 
A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public use or on 
sale in this country, more than one year prior to the date of application for patent in the United States. 

Claims 8-9, 15-16, and 19-23 are rejected under 35 U.S.C. 102(b) as being anticipated by 
Chip et al. ("The Coreware Methodology: building a 200 Mflop processor in 9 man months"). 
Re claim 8, Chip et al. disclose in Figures 2 and 4 an integrated circuit (e.g.
Figure 4) comprising: a multiplier (e.g. FMUL as floating-point multiplier in Figure 2)
coupled to receive interleaved operands (e.g. a, e, i, and m as the interleaved operands in
Figure 2, wherein a, e, i, and m are selected column-wise from the 4x4 matrix in
equation 1, page 549) and to produce a product (e.g. output of FMUL in Figure 2), and a
multi-threaded accumulator (e.g. FADD as floating-point adder in Figure 2 and page 550,
right column, lines 8-12) coupled to the multiplier to receive the product (e.g. the input of
FADD is connected to the output of FMUL in Figure 2).

Re claim 9, Chip et al. further disclose in Figures 2 and 4 a control circuit to
interleave input interleaved operands from different operand streams into the multiplier 
(e.g. Opcode circuit in Figure 4). 

Re claim 15, Chip et al. further disclose in Figures 2 and 4 the integrated circuit 
(e.g. Figure 4) is a circuit selected from the group comprising a processor (e.g. abstract), 
a memory (e.g. register files in Figure 4), a memory controller (e.g. Opcode control in 
Figure 4), an application specific integrated circuit, and a communications device (e.g. 
page 549 left column under graphic processor). 

Re claim 16, Chip et al. disclose in Figures 2 and 4 an accumulator circuit (e.g. 
FADD in Figure 2) to accept operands from different threads interleaved in time (e.g. 
page 550 lines 9-13 right column), the accumulator having intermediate registers to 
simultaneously hold partial results from each of the different threads (e.g. Figure 2 
wherein the FADD holds 4 separate registers for xt, yt, zt, and wt respectively). 
Re claim 19, Chip et al. further disclose in Figures 2 and 4 the operands are
floating point numbers in IEEE single precision format (e.g. page 551 line 5 under
conclusion section).

Re claim 20, Chip et al. further disclose in Figures 2 and 4 the operands are 
floating point numbers in a floating point format other than IEEE single precision format 
(e.g. page 551 line 5 under conclusion section). 

Re claim 21, Chip et al. further disclose in Figures 2 and 4 the floating point 
numbers include exponent fields with a least significant bit weight other than one (e.g. 
page 550 lines 4-5 in right column). 

Re claim 22, Chip et al. further disclose in Figures 2 and 4 the floating point 
numbers include exponent fields with a least significant bit weight equal to thirty-two 
(e.g. page 550 lines 4-5 in right column). 

Re claim 23, Chip et al. disclose in Figures 2 and 4 a multiplier to produce a 
product (e.g. output of FMUL as a*x in page 550); and an accumulator (e.g. FADD in 
Figure 2) coupled to receive the product from the multiplier, the accumulator including 
sequential elements to provide a multi-threaded capability (e.g. page 550 lines 9-14 right 
column). 

Claim Rejections - 35 USC § 103 
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the
manner in which the invention was made. 

Claims 10-11 are rejected under 35 U.S.C. 103(a) as being obvious over Chip et al. ("The
Coreware Methodology: building a 200 Mflop processor in 9 man months") in view of 
Debabrata et al. ("A 600 MHz half-bit level pipelined accumulator-interleaved multiplier 
accumulator core"). 

Re claim 10, Chip et al. disclose in Figures 2 and 4 the multi-threaded 
accumulator is configured to sum floating point numbers having mantissas (e.g. FADD in 
Figure 2). Chip et al. do not disclose the mantissa is in carry-save format. However, 
Debabrata et al. disclose in Figure 4 the multi-threaded accumulator is configured to sum 
floating point numbers having mantissas in carry-save format (e.g. Figure 4 and page 502 
lines 2-5, wherein the final result of the accumulator is in carry-save format). Therefore,
it would have been obvious to a person having ordinary skill in the art at the time the
invention was made to incorporate the carry-save mantissa format of Debabrata et
al.'s invention into Chip et al.'s invention because doing so would increase system
performance (e.g. abstract, section 4, and page 502 lines 5-9).

Re claim 11, Chip et al. further disclose in Figures 2 and 4 the multi-threaded 
accumulator includes at least one intermediate register to facilitate accumulating two 
interleaved product streams simultaneously (e.g. FADD in Figure 2). 

Claims 12-14, 17-18, and 24-30 are rejected under 35 U.S.C. 103(a) as being obvious 
over Chip et al. ("The Coreware Methodology: building a 200 Mflop processor in 9 man 
months") in view of Choquette (U.S. 6,480,872).
Re claim 12, Chip et al. do not disclose in Figures 2 and 4 a floating point
conversion unit coupled between the multiplier and the multi-threaded accumulator to
convert the product from a first floating point representation to a second floating point
representation. However, Choquette discloses in Figure 4 a floating point
conversion unit (e.g. 414) coupled between the multiplier (e.g. 410 and 412) and the
multi-threaded accumulator (e.g. 416) to convert the product from a first floating point
representation to a second floating point representation (e.g. before the shifter and after
the shifter respectively). Therefore, it would have been obvious to a person having
ordinary skill in the art at the time the invention was made to add a floating point
conversion unit coupled between the multiplier and the multi-threaded accumulator to
convert the product from a first floating point representation to a second floating point
representation, as seen in Choquette's invention, into Chip et al.'s invention because it
would enable the correct product accumulation to be produced by shifting or
aligning the product to the accumulation register (col. 5 lines 5-9).

Re claim 13, Chip et al. further disclose in Figures 2 and 4 the first floating point 
representation includes an exponent field having a least significant bit weight of one, and 
the second floating point representation includes an exponent field having a least 
significant bit weight of thirty-two (e.g. page 550 lines 4-5 in right column). 

Re claim 14, Chip et al. do not disclose in Figures 2 and 4 the multi-threaded
accumulator circuit includes at least one constant shifter to conditionally shift a mantissa
thirty-two bit positions. However, Choquette discloses in Figure 4 a constant shifter (e.g.
414) to conditionally shift a mantissa thirty-two bit positions. Therefore, it would have
been obvious to a person having ordinary skill in the art at the time the invention was made
to add a constant shifter for shifting a mantissa thirty-two bit positions, as seen in
Choquette's invention, into Chip et al.'s invention because it would enable the correct
product accumulation to be produced by shifting or aligning the product to the
accumulation register (col. 5 lines 5-9).

Re claims 17-18, Chip et al. do not disclose in Figures 2 and 4 a constant shifter
prior to a first intermediate register, and a multiplexor subsequent to the first intermediate
register and an adder circuit prior to a second intermediate register; and a second
multiplexor subsequent to the second intermediate register. However, Choquette
discloses in Figure 4 a constant shifter (414) prior to a first intermediate register (411),
and a multiplexor (e.g. the mux selecting operands into adder 416 on the
left) subsequent to the first intermediate register and an adder circuit (e.g. 416) prior to a
second intermediate register; and a second multiplexor (e.g. the mux selecting operands into
adder 416 on the right) subsequent to the second intermediate register (e.g. 418).
Therefore, it would have been obvious to a person having ordinary skill in the art at the
time the invention was made to add a constant shifter prior to a first intermediate register,
and a multiplexor subsequent to the first intermediate register and an adder circuit prior to
a second intermediate register; and a second multiplexor subsequent to the second
intermediate register, as seen in Choquette's invention, into Chip et al.'s invention because
it would enable the correct product accumulation to be produced by shifting or
aligning the product to the accumulation register (col. 5 lines 5-9).



Application/Control Number: 10/071,373 Page 10 

Art Unit: 2193 

Re claim 24, it has the same limitations as recited in claim 12. Thus, claim 24 is also
rejected under the same rationale as cited in the rejection of claim 12.

Re claim 25, Chip et al. further disclose in Figures 2 and 4 the accumulator (e.g. 
FADD in Figure 2) is configured to produce a present sum from the converted product 
(e.g. output of FMUL) and a previous sum (e.g. feedback from FADD) having the second 
exponent weight. 

Re claim 26, Chip et al. do not disclose a post-normalization unit to convert the
present sum to a floating-point resultant having the first exponent weight. However,
Choquette discloses in Figure 4 a post-normalization unit to convert the present sum to a
floating-point resultant having the first exponent weight (e.g. feedback of register C into
the mux prior to entering shifter 414 in Figure 4). Therefore, it would have been obvious to
a person having ordinary skill in the art at the time the invention was made to add a
post-normalization unit to convert the present sum to a floating-point resultant having the first
exponent weight, as seen in Choquette's invention, into Chip et al.'s invention because it
would enable the result to be provided in a desired format as predetermined by the system.

Re claim 27, Chip et al. further disclose in Figures 2 and 4 the accumulator
includes: an adder path (e.g. Figure 2). Chip et al. do not disclose an adder bypass path.
However, Choquette discloses in Figure 4 an accumulator including an adder bypass path
(e.g. output of 412 directly feeds to the result register 416). Therefore, it would have
been obvious to a person having ordinary skill in the art at the time the invention was made
to add a bypass path, as seen in Choquette's invention, into Chip et al.'s invention because it
would increase the system performance by bypassing the alignment.
Re claim 28, it has the same limitations as recited in claim 21. Thus, claim 28 is also
rejected under the same rationale as cited in the rejection of claim 21.

Re claim 29, it has the same limitations as recited in claim 22. Thus, claim 29 is also
rejected under the same rationale as cited in the rejection of claim 22.

Re claim 30, it has the same limitations as recited in claim 10. Thus, claim 30 is also
rejected under the same rationale as cited in the rejection of claim 10.

(10) Response to Argument 

A. Discussion of the rejection of claims 8-9, 15-16, and 19-23 under 35 U.S.C. 102(b) as 
being anticipated by Chip et al. ("The Coreware Methodology: Building a 200 Mflop 
Processor in 9 Man Months"). 

The applicant argues on pages 8-10, for claims 8-9 and 15, that the cited reference
by Chip et al. fails to disclose or teach interleaved operands and fails to teach
multi-thread operations, for the following two reasons. First, Chip et al.'s disclosure of
"effectively interleaving" a multiplier and an adder is not the same as receiving actual
interleaved operands at the multiplier (page 9). Second, table 2 of the cited reference
discloses an addition operation, but fails to disclose a multi-thread addition wherein the
multiple threads are defined as sets of operands that produce different products and are
maintained as separate entities throughout the operations performed on these sets of
operands as they pass through the multi-threaded accumulator.

The examiner respectfully submits that the cited reference by Chip et al. either
inherently or expressly discloses every limitation recited in the claimed invention.
In particular, the reference discloses the interleaved operands and the multi-thread
operations (e.g. equation 1 on page 549, first paragraph of the right column on page 550, and
Figure 2). The claims do not define or require how the operands are interleaved; rather,
the claims require that the multiplier receive the interleaved operands, which can be
seen in equation 1 on page 549 and Figure 2 on page 550, wherein the interleaved operands
{a, e, i, m} are input into the multiplier (e.g. FMUL in Figure 2). For non-interleaved
multiplication with respect to equation 1, the inputs into the multiplier would be in
alphabetical sequence as {a, b, c, d, e, f, g, h, ...}, as seen in Figure 1. For interleaved
multiplication with respect to equation 1, however, the inputs are interleaved by 4 units or
by row into the multiplier as {a, e, i, m, b, f, j, n, ...}, as seen in Figure 2. Therefore, Chip et al.'s
disclosure of "effectively interleaving" operands is the same as receiving actual
interleaved operands at the multiplier. Further, the reference also discloses a
"multi-threaded accumulator" in Figure 2 and table 2 below, wherein the accumulator labeled as
FADD has four different threads to handle four different accumulations, as seen in the
results in table 2 below. As defined by the specification at page 3 line 2 to page 6 line 12, the
multi-thread accumulator has multiple intermediate registers (e.g. 4 registers in Figure 2
corresponding to the 4 separate operations in table 2 below) to simultaneously hold partial
results from each of the different threads (e.g. at cycle 8 and on, the accumulator can
simultaneously hold 4 partial results as summations of products corresponding to different
operations, as seen in table 2 below). Thus, the extended table 2 of the cited reference
discloses a multi-thread addition (e.g. the operation in columns 7-10 is addition) wherein the
multiple threads are defined as sets of operands that produce different products (e.g. each
of columns 3-6 produces different products according to equation 1 on page 549) and are
maintained as separate entities throughout the operations performed on these sets of
operands as they pass through the multi-threaded accumulator (e.g. in extended table 2,
each of the different products is accumulated separately along the registers).
Table 2: extended table of latency of the interleaved scheme



The applicant argues on page 10, third paragraph, to page 11, fifth paragraph, that
the cited reference by Chip et al. also fails to disclose "the accumulator having intermediate
registers to simultaneously hold partial results from each of the different threads" as
recited in claim 16 and "the accumulator including sequential elements to provide a
multi-threaded capability" as recited in claim 23, because the cited reference discloses an
accumulator that operates on four coordinate transformations in parallel. In other words,
Chip et al. does not teach the accumulator having intermediate registers to
simultaneously hold partial results from each of the different threads.
The examiner respectfully submits that the cited reference by Chip et al. clearly
discloses "the accumulator having intermediate registers to simultaneously hold partial
results from each of the different threads" and "the accumulator including sequential
elements to provide a multi-threaded capability" as briefly addressed above. In addition,
the extended table 2 of the cited reference discloses the accumulator having intermediate
registers (e.g. in Figure 2 the accumulator, FADD, has four registers) to simultaneously
hold partial results from each of the different threads (e.g. the extended table 2 discloses
four registers in columns 7-10 simultaneously holding 4 separate summations of the
product results). The accumulator (FADD in Figure 2) accumulates input results in
sequential order, as seen in the extended table 2. The accumulator does not operate on
four coordinate transformations in parallel as alleged by the applicant. As seen in the
extended table 2, the addition operation occurs only at the first column or register,
one addition per cycle in sequence, and the summation then propagates along the
registers in the accumulator.
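
The operand ordering and per-thread accumulation discussed above can be illustrated with a minimal sketch. It is not taken from Chip et al.; it assumes a 4x4 matrix whose elements are labeled a through p row by row, and it models the accumulator as a chain of four registers in which one addition occurs per cycle at the first register. The names interleaved_order and interleaved_mac are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple


def interleaved_order(n: int = 4) -> List[Tuple[int, int]]:
    """Return (row, col) pairs column by column: a, e, i, m, b, f, j, n, ..."""
    return [(row, col) for col in range(n) for row in range(n)]


def interleaved_mac(matrix: Sequence[Sequence[float]],
                    vector: Sequence[float]) -> List[float]:
    """Multiply-accumulate with one partial sum (thread) per output row.

    Assumption: the accumulator is modeled as a rotating chain of n registers.
    Each cycle, the oldest partial sum leaves the chain, receives one product,
    and re-enters at the first register.
    """
    n = len(vector)
    registers = deque([0.0] * n)
    for row, col in interleaved_order(n):
        product = matrix[row][col] * vector[col]   # multiplier stage
        oldest = registers.pop()                   # partial sum of thread `row`
        registers.appendleft(oldest + product)     # single addition this cycle
    # After n*n cycles the chain holds the threads in reverse order.
    return list(reversed(registers))
```

Under these assumptions each of the four partial sums remains a separate entity while the products arrive in interleaved order, and only one addition is performed per cycle.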

B. Discussion of the rejection of claims 10-11 under 35 U.S.C. 103(a) as being obvious
in view of the proposed combination of Chip et al. ("The Coreware Methodology: 
Building a 200 Mflop Processor in 9 Man Months") in view of Debabrata et al. ("A 600 
MHz Half-bit Level Pipelined Accumulator-Interleaved Multiplier Accumulator Core"). 

The applicant argues on pages 13-14, for claims 10-11, that the cited reference
fails to disclose the limitations recited in independent claim 8, and further the applicant
disagrees with the motivation to combine the references because the carry-save
accumulation is a "bottleneck" that affects the throughput. Thus, the applicant contends, it
is improper or unmotivated to combine the references for obviousness.

The examiner respectfully submits that a carry-save adder having carry-save format
is well known in the art and conventionally used in accumulators. The secondary reference
by Debabrata et al. clearly discloses the carry-save accumulator in Figure 4 and under
section 4, "Multiplier-Accumulator (MAC) Architecture". In order to yield
higher throughput, Debabrata et al. propose an interleaved MAC pipelined at the level of
an XOR gate. In any case, the MAC implements the carry-save format to
achieve high throughput (e.g. abstract and Figure 6). Therefore, it would have been
obvious to a person having ordinary skill in the art at the time the invention was made to
add a carry-save accumulation with the mantissas in carry-save format, as seen in
Debabrata et al.'s invention, into Chip et al.'s invention because it would
increase the system performance in terms of throughput (e.g. abstract, section 4, and page
502 lines 5-9). Thus, it is proper and motivated to combine the references under
obviousness to disclose the claimed invention.
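
For context, carry-save format in general can be sketched as follows. This is a generic illustration of the well-known technique on non-negative integers, not a model of the circuit in Debabrata et al.; the function names are hypothetical. The running total is kept as a (sum, carry) pair so that no carry has to ripple across the word on each accumulation step, and a single ordinary addition resolves the pair at the end.

```python
from typing import Iterable, Tuple


def carry_save_add(s: int, c: int, x: int) -> Tuple[int, int]:
    """Reduce three operands to a (sum, carry) pair without carry propagation."""
    partial_sum = s ^ c ^ x                      # bitwise sum, carries ignored
    carry = ((s & c) | (s & x) | (c & x)) << 1   # carries, moved one bit left
    return partial_sum, carry


def carry_save_accumulate(values: Iterable[int]) -> Tuple[int, int]:
    """Accumulate non-negative integers, keeping the total in carry-save form."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s, c


def resolve(s: int, c: int) -> int:
    """Convert a carry-save pair to an ordinary integer with one final addition."""
    return s + c
```

Each step preserves the invariant that s + c equals the true running total, which is why the conversion can be deferred to the end.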

C. Discussion of the rejection of claims 12-14, 17-18, and 24-30 under 35 U.S.C. 103(a)
as being obvious in view of the proposed combination of Chip et al. ("The Coreware
Methodology: Building a 200 Mflop Processor in 9 Man Months") in view of Choquette 
(U.S. 6,480,872). 

The applicant argues on page 15, for claims 12-14, 17-18, and 24-30, which
depend on claims 8, 16, and 23 respectively, that Choquette fails to teach or suggest "the
interleaved operands" or a "multi-threaded accumulator coupled to the multiplier to
receive the product" as generally recited in claims 8, 16, and 23.

The examiner respectfully submits that the features "the interleaved operands" or
a "multi-threaded accumulator coupled to the multiplier to receive the product" are found
in the primary reference by Chip et al., as clearly addressed above, wherein the primary
reference discloses the interleaved operands and the multi-thread operations (e.g.
equation 1 on page 549, first paragraph of the right column on page 550, and Figure 2). The
secondary reference by Choquette is used to disclose the missing element, namely the
floating-point conversion unit, which it would have been obvious to a person having ordinary
skill in the art at the time the invention was made to add.

The applicant argues in page 16 for claims 12-14, 17-18, and 24-30 that there is 
no teaching or suggestion to combine the shifter of Choquette with the matrix 
multiplication based on interleaved multiplier accumulator algorithm. Further, the 
inference that Chip et al. requires shifting or aligning to produce a "correct" product 
accumulation appears to be counter to the disclosure in Chip et al. 

The examiner respectfully submits that the primary reference by Chip et al.
clearly discloses the interleaved multiplier accumulator algorithm, as addressed
above. The missing element from the primary reference is the floating-point conversion
unit for converting from a first floating-point representation to a second floating-point
representation. This missing element can be found in the secondary reference by
Choquette as shifting and/or aligning the partial product prior to adding in the accumulation
register (e.g. col. 5 lines 5-9). As suggested by the secondary reference in column 5, first
paragraph, the input operand must be aligned using a shifter (e.g. part 414 in Figure 4)
before adding in order to produce the correct result (e.g. Figures 6-7
and col. 6 lines 45-65).
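
The general need to align operands before a floating-point addition can be shown with a minimal sketch. It is a generic illustration, not a model of shifter 414 in Choquette; it assumes each value is a pair (mantissa, exponent) meaning mantissa * 2**exponent with a non-negative integer mantissa, and the name align_and_add is hypothetical.

```python
from typing import Tuple

Value = Tuple[int, int]  # (mantissa, exponent): mantissa * 2**exponent


def align_and_add(product: Value, accumulator: Value) -> Value:
    """Align both mantissas to the larger exponent, then add them.

    The operand with the smaller exponent is shifted right by the exponent
    difference; bits shifted out are truncated in this simplified model.
    """
    (pm, pe), (am, ae) = product, accumulator
    target = max(pe, ae)
    pm_aligned = pm >> (target - pe)
    am_aligned = am >> (target - ae)
    return pm_aligned + am_aligned, target
```

For example, a product of (12, 0), which is 12, added to an accumulated value of (5, 2), which is 20, gives (8, 2), which is 32. Adding the mantissas 12 and 5 without the shift would not represent the correct sum at either exponent.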

The applicant argues on page 17 that the Final Office Action fails to point out any
portions of either of the cited documents to support the benefits "enable to properly
provide a desire format as predetermined by the system" and "increase the system
performance by bypassing the alignment".

The examiner respectfully submits that the motivation for obviousness does not necessarily
come from either cited reference; the combination can still be motivated by the common
knowledge of a person having ordinary skill in the art at the time the invention was made. Thus, it
is obvious to add a post-normalization process to convert to a different format as necessary
for use in the next process as predetermined by the system. In addition, bypassing
the alignment during the addition would obviously increase the system performance
because it would take less time, or fewer cycles, to produce the result of the addition, whereas
alignment during the addition would take a cycle or so to perform. Therefore, it would
have been obvious to a person having ordinary skill in the art at the time the invention was
made to add a bypass path in order to increase the system performance where possible.
(11) Related Proceeding(s) Appendix

No decision rendered by a court or the Board is identified by the examiner in the Related 
Appeals and Interferences section of this examiner's answer. 

For the above reasons, it is believed that the rejections should be sustained. 
Respectfully submitted, 